Incremental Generalized Eigenvalue Classification on Data Streams
Abstract
As applications on massive data sets emerge with increasing frequency, we face the problem of analyzing data as soon as they are produced. This is true in many fields of science and engineering: in high energy physics, experiments have been carried out to transfer data at a sustained rate of 150 gigabits per second. In 2007, that speed will enable the delivery to users of data continuously produced by the LHC particle accelerator located at CERN. Other examples can be found in network traffic analysis, telecommunications data mining, discrimination of data from sensors that monitor pollution and biological hazards, and video and audio surveillance. In all cases, computational procedures have to deal with a large amount of data delivered in the form of data streams. Traditional data mining techniques assume that the dataset is static and that, to increment knowledge, random samples are extracted from the dataset. In this study, we use Incremental Regularized Generalized Eigenvalue Classification (I-ReGEC), a supervised learning algorithm, to continuously train a classification model from a data stream. The advantage of this technique is that the classification model can be updated incrementally. The algorithm decides online which points contain new information and updates the available classification model accordingly. Through numerical experiments on a synthetic dataset, we show the performance of the method, highlighting its behavior with respect to the size of the incremental training set, the classification accuracy, and the throughput of the data stream.
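The idea in the abstract above, a classifier built from a generalized eigenvalue problem that grows its training set only when a streamed point carries new information, can be illustrated with a minimal sketch. This is not the authors' I-ReGEC implementation. It assumes a linear two-class model in which each class gets a plane from the extreme eigenvectors of G z = λ H z, plain Tikhonov regularization with a hypothetical parameter `delta`, and a simple selection rule that stores a streamed point only when the current model misclassifies it. The function names `fit_planes`, `predict`, and `train_on_stream` are illustrative, and NumPy is assumed to be available.

```python
import numpy as np

def fit_planes(A, B, delta=1e-3):
    """Fit one plane per class; each z = [w, gamma] describes x @ w - gamma = 0."""
    Ga = np.hstack([A, -np.ones((len(A), 1))])
    Gb = np.hstack([B, -np.ones((len(B), 1))])
    eye = np.eye(Ga.shape[1])
    G = Ga.T @ Ga + delta * eye  # assumption: plain Tikhonov regularization
    H = Gb.T @ Gb + delta * eye
    # Solve G z = lambda H z by a Cholesky reduction to a symmetric problem.
    L_inv = np.linalg.inv(np.linalg.cholesky(H))
    vals, Y = np.linalg.eigh(L_inv @ G @ L_inv.T)  # eigenvalues in ascending order
    Z = L_inv.T @ Y
    # Smallest eigenvalue: plane near A, far from B. Largest: the reverse.
    return Z[:, 0], Z[:, -1]

def plane_distance(z, x):
    w, gamma = z[:-1], z[-1]
    return abs(x @ w - gamma) / np.linalg.norm(w)

def predict(planes, x):
    """Assign x to the class whose plane is closest (0 for A, 1 for B)."""
    return 0 if plane_distance(planes[0], x) <= plane_distance(planes[1], x) else 1

def train_on_stream(seed_X, seed_y, stream, delta=1e-3):
    """Start from a small seed set; keep a streamed point only if misclassified."""
    X, y = list(seed_X), list(seed_y)

    def refit():
        Xa = np.array([p for p, c in zip(X, y) if c == 0])
        Xb = np.array([p for p, c in zip(X, y) if c == 1])
        return fit_planes(Xa, Xb, delta)

    planes = refit()
    for x, label in stream:
        if predict(planes, x) != label:  # assumed test for "new information"
            X.append(x)
            y.append(label)
            planes = refit()
    return planes, len(X)

# Synthetic demo: two well-separated Gaussian blobs (illustrative data only).
rng = np.random.default_rng(0)

def sample(n_per_class):
    a = rng.normal(0.0, 0.3, size=(n_per_class, 2))
    b = rng.normal(4.0, 0.3, size=(n_per_class, 2))
    X = np.vstack([a, b])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    order = rng.permutation(len(y))
    return X[order], y[order]

seed_X, seed_y = sample(10)
stream_X, stream_y = sample(100)
test_X, test_y = sample(100)
planes, n_kept = train_on_stream(seed_X, seed_y, zip(stream_X, stream_y))
accuracy = np.mean([predict(planes, x) == t for x, t in zip(test_X, test_y)])
print(f"kept {n_kept} of {len(seed_y) + len(stream_y)} points, accuracy {accuracy:.2f}")
```

In this sketch the stored training set stays small whenever the streamed points are redundant, so each refit solves an eigenproblem whose size depends only on the number of features. A selection rule closer to the one described in the paper, or a kernel variant, would replace the misclassification test inside `train_on_stream`.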
Similar resources
Incremental Classification with Generalized Eigenvalues
Supervised learning techniques are widely accepted methods to analyze data for scientific and real world problems. Most of these problems require fast and continuous acquisition of data, which are to be used in training the learning system. Therefore, keeping such systems up to date may become cumbersome. Various techniques have been devised in the field of machine learning to solve this probl...
An online generalized eigenvalue version of Laplacian Eigenmaps for visual big data
This paper presents a novel online version of the Laplacian eigenmap, termed generalized incremental Laplacian eigenmap (GENILE). The Laplacian eigenmap is one of the most popular manifold-based dimensionality reduction techniques and is performed by solving a generalized eigenvalue problem. We have used the Swiss roll and S-curve datasets, the most popular datasets used for manifold-based learning techniques, in this paper as artif...
An Adaptive Nearest Neighbor Classification Algorithm for Data Streams
In this paper, we propose an incremental classification algorithm which uses a multi-resolution data representation to find adaptive nearest neighbors of a test point. The algorithm achieves excellent performance by using small classifier ensembles where approximation error bounds are guaranteed for each ensemble size. The very low update cost of our incremental classifier makes it highly suita...
A mathematically simple method based on definition for computing eigenvalues, generalized eigenvalues and quadratic eigenvalues of matrices
In this paper, a fundamentally new method, based on the definition, is introduced for numerical computation of eigenvalues, generalized eigenvalues and quadratic eigenvalues of matrices. Some examples are provided to show the accuracy and reliability of the proposed method. It is shown that the proposed method gives sequences different from those of existing methods, but they are still convergent to th...
On the Utility of Incremental Feature Selection for the Classification of Textual Data Streams
In this paper we argue that incrementally updating the features that a text classification algorithm considers is very important for real-world textual data streams, because in most applications the distribution of data and the description of the classification concept change over time. We propose the coupling of an incremental feature ranking method and an incremental learning algorithm that ...
Publication date: 2007